Metrics on NFL Receiving Data

CMSC320 Fall 2024 Final Project

By Arina Petrova, Timothy Kramer, Abigail Jenkins, Tariq Abdelkahlek, Zach Bellantoni

Screenshot 2024-12-03 at 9.54.52 PM.png

Contributions:

Arina: Created update GitHub Repository to publish deliverable and host on GitHub pages. Worked on the introduction section, text context/analysis, polishing final EDA(post hoc tests, analysis for EDA), and adding images to the project. (Sections worked on B, C, F, G)

Tariq: Created original GitHub Repository and idea for project (checkpoint 1). Worked on EDA (checkpoint 2), created and worked on topics and ideas for the full layout of the project. Revised introduction section, completed data curation section, and completed conclusion section of final deliverable. (Sections worked on: A, B, C, D, F, G)

Tim: Worked on EDA (checkpoint 2), created and worked on topics and ideas for the full layout of the project. Revised introduction section, completed conclusion section of final deliverable. Layed out steps for members to contribute to project, analysis and explanation of EDA, explanation of ML. (Sections worked on A, B, C, D, F, G)

Abby: Worked on the ML portion of the final deliverable. Used the EDA to develop ML models to further investigate our topic. (Sections worked on D, E, F, G)

Zach: Facilitated formatting throughout the final project, made needed formatting changes and aesthetic updates. Worked on conclusion for final deliverable(B,G)

Introduction

The National Football League (NFL) scouts are always on the lookout for top talent in the field when making decisions about who joins their team. There are thousands of measures for a player’s skill level and performance, so how do they sift through to find the key indicators? The success of a football team depends largely on the skills of individual players, so being able to identify talented players with a high likelihood of success is of utmost importance. Everyone wants to be a Super Bowl winner!

Patrick Mahomes 122222.jpg In this tutorial, we will focus on NFL Receivers, who are players whose role is to catch and run with the ball, to find which factors are most important in predicting their success in the league. We will measure success based on a number of career statistics, such as their number of years in the league, number of yards they can run with the ball, and percentage of receptions of the ball. Ultimately, we hope to display which factors play the biggest role in the success of these incoming NFL receivers.

Helpful Football Terms Glossary

Reception: Player catches the football on a forward-pass

Rookie: First year player

Veteran: A player after the first year (usually three or more years)

Target: When a player receives a pass to them (caught or incomplete pass)

Target_share: The percentage of targets to a player relative to the total targets to everyone

PPR: Used in fantasy football as a point style, stands for point-per-reception.

Data Curation

To start our analysis, we need to find a database storing NFL player data. The philosophy here is in order to find the best future receivers, we must first look at the current best receivers. We want to examine the recent receiving superstars in the NFL and find what kind of factors made them special. Maybe we can find a certain metric or statistic that would help us predict a given player’s success in the game. In order to create different visualizations and hypothesis tests to support our conclusions, we will have to find a dataset of current NFL rosters and seasonal data. The most renowned third-party NFL data tracker are the NFLVerse packages. We can import our needed data through the API provided by the NFLVerse team. We can then create a Pandas dataframe to track these and make any necessary database merges/drops.

First, we must install the needed library - matplotlib - which will allow us to create interesting visualizations of our data to aid in analysis. We are also downloading the NFL dataset in this line.

Next, we need to import the rest of the needed libraries, such as pandas for working with dataframes, pyplot to make various plots, scipy.stats to run hypothesis and other statistical tests.

We import them "as" a shortened nickname to make it easier to refer to each library throughout the code.

The NFL data has multiple dataframes within it, but we only want to extract the ones that will be most interesting for our data analysis. We extract these and assign them to two variables below and print out all of the columns in each dataset so we know what we're working with:

Now, it's time for some data cleanup.

We still have two separate dataframes, but we need only one to perform data analysis easily. We merge them by joining on player_id:

Next, we want to further refine what data we keep. We chose to only keep players that are identified as running backs, receivers, and tight ends, as these are the main positions who receive passing targets.

We also drop some observations that contain null values, and in the end we select only the columns that will be useful to us.

In addition, we want to create a new feature which will be helpful for our analysis. Currently we do not have a catch percentage from players. We will accomplish this by creating a new catch rate feature by diving player receptions by their targets.

What a comeback—this data is ready for exploration!

Exploratory Data Analysis

In football, one of the biggest factors that classify players is their position, which determines where they line up on the field and their role in the offense. This heavily influences how often they receive the ball and is affected by factors like the game script and remaining time.

When analyzing players' receiving stats, it’s essential to consider their position and role within the team. Wide receivers (WRs) are usually the primary targets for deeper passes and complex routes, while tight ends (TEs) and running backs (RBs) often have specialized roles, such as blocking, short-yardage receiving, or serving as safety-valve options for shorter gains.

This dynamic can shift based on game situations, offensive strategies, or individual player skill sets. To analyze this, we will use an ANOVA (Analysis of Variances, aka a T-test for more than two groups) test to examine differences in average positional target share. This analysis can reveal which positions tend to have higher target shares, a key factor in predicting players' season totals. Understanding these positional trends is crucial for evaluating a player's receiving potential over the course of a season.

Since this tiny p-value is smaller than the typical 0.05 alpha level, we have found statistically significant evidence that the position a player has will affect the rate they receive a ball. We are able to reject the null hypothesis as there is a correlation between the two factors. Lets visualize this result using a boxplot so that it is easier to understand:

Clearly, there are some visible differences between the different positions target shares. Given that we rejected the null hypothesis of our ANOVA test, we want to apply a Post-Hoc test to learn more about the differences between each position. We will use Tukey's Honestly Significant Difference (HSD) test, which is designed to identify where the differences lie between group means.

Looking at the p-adj column for each row comparing the different positions, we see that all of the groups have significant differences between their respective means.

Using these two tests, we confirmed our belief that the position of a player that can potentially catches the ball will significantly impact how many balls they are able to catch.


Another EDA angle:

A common stereotype in football is that rookies tend to "play it safe," avoiding contact compared to their seasoned counterparts. This suggests that rookies might record fewer Yards After Catch (YAC)—the distance a player covers after catching the ball—possibly due to hesitation or inexperience. This perception of rookies often fades as players gain experience. To evaluate this belief, we will conduct a two-sample t-test to compare the average YAC between rookies and veteran players. This analysis will help determine whether experience significantly impacts how players handle receiving opportunities and contact.

Hmmmm, this test produced a different result than the ANOVA test, as we did not find statistically significant evidence that there was any difference in the Yards After Catch between new players and experienced players. We failed to reject our null hypothesis, and we cannot prove from this test alone the correlation. However, this is an interesting conclusion, as it goes against the commonly held belief that more experienced players have higher yards after catch numbers. What a sigh of relief for Rookies!

image.png

Experience vs Recieving Yards

The relationship between experience and receiving yards is not straightforward, being influenced by numerous factors, such as athleticism, offensive schemes, team dynamics, competition, and much more. By examining the correlation between a player's years of experience and their seasonal receiving yards, we aim to get a base correlation despite the many confounding variables, to potentially highlight contradicting trends Does more experience consistently lead to better performance, or do external factors outweigh the influence of tenure? While this analysis may not yield definitive conclusions, it could highlight trends that require deeper analysis.

Based on our r coefficient of 0.0625, We see that there is not a strong correlation between the two as the r value is close to 0. Other factors, such as target share, team role, position, or offensive scheme, may play a more significant role in influencing receiving performance, which we will look to examine.

Here is a scatter plot showcasing our findings:

As you can see, there is not a super strong and clear correlation between the varaibles. In order to draw a concluson we will need further analysis.

Catch rate vs Target Share

Catch rate —the ratio of catches vs the number of passes thrown their way - and target share - the proportion of a team's total pass attempts directed at a player - are two crucial statistics for receiving players in the NFL. These statistics highlight their role in the offense, through analysis, We can examine players who are more frequently targeted and what the relationship between target share is. A high target share might indicate trust from the quarterback, but could also lead to more challenging situations, like facing double coverage or receiving off-target throws. Conversely, players with a lower target share may only be thrown to in high-probability situations, inflating their catch percentage. By analyzing this correlation, we can uncover whether there's a trade-off between opportunity and efficiency or if consistent performers thrive regardless of how much they are relied upon in the passing game. This analysis is important not just for evaluating individual players, but also for understanding how teams might optimize their passing strategies.

The r-value of -0.050 indicates a very weak negative correlation between target share and catch rate. This suggests that as a player's target share increases, their catch rate slightly decreases. However, the relationship is extremely weak, meaning target share is not a significant predictor of catch rate based on this data.

ML and Primary Analysis

First we preprocess the data by setting up feature and label dataframes and dropping na values. We convert positions to one hot encoding, scale the features using minmax scaling, and split into training and testing datasets with 20% of the data being testing. We do this using the sklearn model for splitting data

Then we must fit and score the model. For linear regression, the primary metric used to score the model is the R2 score, which is the proportion of variance predicted by the independent variables, in this case being the chosen features. This score typically ranges between 0 and 1, with 0 being low and 1 being high.

A score of 0.9195 is very high being very close to 1, indicating that the model explains a very high proportion of the variance in the number of receiving yards based on the chosen features.

The model prioritizes important features when predicting an output by giving them a higher weight. We can look at which features are most important by their coefficients. We do this by using feature importance, which is a built in feature that looks at which variables of the training are the most impactful.

Visualization(s) of Data

We can visualize data by looking at the predicted data versus the actual data. The first plot compares the two values, with perfectly predicted data falling on the dotted line. The second plot is graphing the residual values, with the third plot showing their distribution. We can see that they are clustered around the center, which means that our data is more commonly closer to the actual value.

Conclusion

After reviewing the analysis and insights from the graphs we are able to gain a deeper understanding of receiving in the NFL. The strongest correlation was between targets and yards, shown in our feature importance and in the correlations. We also can see the different levels of impact that the different factors have, with catch rate having more importance in both our model and our exploratory analysis.

An uninformed reader would gain a solid understanding of the key metrics used to evaluate NFL receivers and their performance. These metrics include target share, catch percentage, and years of experience, which are calculated utilizing the NFLVerse package. The project’s use of ANOVA tests and correlation analysis help break down these concepts and make them understandable for someone new to football analytics. The use of real world examples and NFL knowledge of the authors clearly demonstrate how the metrics directly relate to player success on the field. For those already familiar with the topic, the project intertwines machine learning with NFL statistics to highlight the most influential factors in predicting said statistics.

Overall, this project combines NFL knowledge and machine learning to create an informative analysis of NFL statistics for readers. It is our goal that following along with the project alows you to gain insights not only into the NFL, but also into how a typical data science project works. The timeline and purpose in each of the different components is crucial in contributing to the projects effectiveness as a whole.

That's the final whistle—catch you next time! 🏈